Effect of Loop Unrolling in Heterogeneous Multi-pipeline ASIPs
Abstract
Introduction
Embedded systems are becoming ubiquitous and pervasive, touching virtually all aspects of daily life. From mobile telephones to automobiles, and from industrial equipment to high-end medical devices, embedded systems now form part of a wide range of devices. Along with non-recurring engineering cost, the main design challenges of embedded devices include power consumption, die size and performance. Although embedded devices used in real-time applications are expected to react quickly, and therefore require high performance, the designers of such systems must always keep an eye on the power consumption and cost of the design.
Since embedded systems usually execute a single application or a small class of applications, processors can be customized to optimize for performance, cost, power and so on. One popular design platform for embedded systems is the Application-Specific Instruction-set Processor (ASIP), which allows such customization without overly hindering design flexibility. Numerous tools and design systems, such as ASIP-meister and Xtensa, have been developed for rapid ASIP generation.
ASIPs usually contain a single execution pipeline. Recently, however, there has been a trend towards multiple pipelines [1, 7]. In [1], a design system was proposed for ASIPs with a varying number of pipelines. Given an application specified in C, the design system generates a processor with a number of heterogeneous pipelines specifically suited to that application. Each pipeline is customized with a different instruction set, and instructions are executed in parallel across all pipelines. The number of cycles needed to execute a program can therefore be lower than on a single-pipeline ASIP, improving the overall performance of the system. This paper describes one way of increasing the performance of such an ASIP: loop unrolling.
Loop unrolling is a compiler technique that can reduce the number of clock cycles spent executing a loop in a program [3, 4]. Although loop unrolling is a traditional compiler optimization, this is the first time it has been attempted in the scheduling algorithm of a multi-pipeline ASIP design. This paper reports the effect of loop unrolling on the performance of a heterogeneous multi-pipeline ASIP.
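As a rough illustration of the general technique (not the scheduling algorithm of this paper, which the abstract does not detail), the following C sketch contrasts a rolled loop with a version unrolled by a factor of four. The function names, the array-sum workload and the unroll factor are all assumptions chosen for the example. The unrolled body performs one loop-control test per four elements, and its four additions are independent of one another, which is the kind of work a scheduler could in principle distribute across parallel pipelines.

```c
#include <stddef.h>

/* Baseline: one add plus one loop-control test and branch per element. */
long sum_rolled(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical example: the same sum unrolled by a factor of 4.
 * The factor is an arbitrary choice for illustration. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main body: four independent adds per loop-control test. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder loop handles the last n % 4 elements. */
    for (; i < n; i++)
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result for any input length; the separate accumulators in the unrolled version remove the dependence between consecutive additions, at the cost of larger code and more registers.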
Similar resources
Instruction Level Parallelism Loop Unrolling
K – Survey of Instruction Set Architectures related to instruction-, data-, thread-, and request-level parallelism necessary for understanding Loop unrolling. ILP, Compiler techniques to increase ILP. Register Renaming, Pipeline Scheduling, Loop Unrolling. Conclusion. CPE 731, ILP. 3. Instruction Level Parallelism. 5 Optimizing Program Performance (Loop Unrolling and Enhancing Parallelism) Michael.
Adaptive Distributed Consensus Control for a Class of Heterogeneous and Uncertain Nonlinear Multi-Agent Systems
This paper has been devoted to the design of a distributed consensus control for a class of uncertain nonlinear multi-agent systems in the strict-feedback form. The communication between the agents has been described by a directed graph. Radial-basis function neural networks have been used for the approximation of the uncertain and heterogeneous dynamics of the followers as well as the effect o...
Using the Meeting Graph Framework to Minimise Kernel Loop Unrolling for Scheduled Loops
This paper improves our previous research effort [1] by providing an efficient method for kernel loop unrolling minimisation in the case of already scheduled loops, where circular lifetime intervals are known. When loops are software pipelined, the number of values simultaneously alive becomes exactly known giving better opportunities for kernel loop unrolling. Furthermore, fixing circular life...
Instruction Re-selection for Iterative Modulo Scheduling on High Performance Multi-issue DSPs
An iterative modulo scheduling is very important for compilers targeting high performance multi-issue digital signal processors. This is because these processors are often severely limited by idle state functional units and thus the reduced idle units can have a positively significant impact on their performance. However, complex instructions, which are used in most recent DSPs such as mac, usu...
RCLP: A Novel Approach for Resource-Constrained Loop Pipelining
In this paper a novel technique for resource-constrained loop pipelining is presented. RCLP is based on several dependence graph operations: loop unrolling, operation retiming, resource-constrained scheduling, and span reduction. All these operations are focused to find a minimum length schedule able to be executed with a limited number of resources and thus maximizing resource utilization. Exper...
Publication date: 2008